Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs

Abstract

Many datasets, including social media data and bibliographic data, can be modeled as graphs. Clustering such graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data nowadays challenges traditional clustering algorithms, and thus a distributed method is required. Hence, we adopt a popular distributed graph computing system, Blogel, based on which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insights into the effectiveness and efficiency of our proposals.
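To make the idea of PPR as a unified distance measure more concrete, here is a minimal single-machine sketch in Python. It is not the paper's Blogel-based implementation, and it covers neither the DBSCAN step nor the entropy-based weight update. The function names, the `attr_weight` parameter, the restart probability `alpha = 0.15`, and the toy data are all assumptions made for illustration. Each attribute value becomes its own node, so a random walk from a source node can reach other nodes through structural links and through shared attribute values.

```python
from collections import defaultdict


def build_star_schema_graph(edges, attributes, attr_weight=1.0):
    """Build a weighted undirected adjacency map {node: {neighbor: weight}}.

    edges:      list of (u, v) structural links
    attributes: {node: {attribute_name: value}}
    Each (attribute_name, value) pair becomes an extra node linked to every
    structural node that carries that value (hypothetical encoding).
    """
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    for node, attrs in attributes.items():
        for name, value in attrs.items():
            attr_node = ("attr", name, value)
            adj[node][attr_node] = attr_weight
            adj[attr_node][node] = attr_weight
    return dict(adj)


def personalized_pagerank(adj, source, alpha=0.15, iterations=100):
    """Power iteration for PPR with restart probability alpha at `source`."""
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in adj}
        nxt[source] += alpha  # restart mass returns to the source
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # walk mass is split in proportion to edge weight
                nxt[v] += (1.0 - alpha) * scores[u] * w / total
        scores = nxt
    return scores


# Toy example (assumed data): "a" links to both "b" and "c",
# but only "b" shares the attribute value topic=db with "a".
graph = build_star_schema_graph(
    edges=[("a", "b"), ("a", "c")],
    attributes={"a": {"topic": "db"}, "b": {"topic": "db"}, "c": {"topic": "ml"}},
)
ppr_from_a = personalized_pagerank(graph, "a")
print(sorted(ppr_from_a.items(), key=lambda kv: -kv[1]))
```

In this toy graph, "b" and "c" have the same structural relation to "a", yet "b" receives the higher PPR score because it is also reachable through the shared attribute node. Raising or lowering `attr_weight` changes how strongly attribute similarity influences the scores, which is the kind of balance the abstract's iterative edge weight update is meant to tune.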

Similar Resources

Distributed Clustering on Graphs

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a ‘coreset’ of...

Clustering with Proximity Graphs: Exact and Efficient Algorithms

Graph Proximity Cleansing (GPC) is a string clustering algorithm that automatically detects cluster borders and has been successfully used for string cleansing. For each potential cluster a so-called proximity graph is computed, and the cluster border is detected based on the proximity graph. However, the computation of the proximity graph is expensive and the state-of-the-art GPC algorithms on...

Efficient Evolutionary Algorithms for the Clustering Problem in Directed Graphs

This paper presents improvements in the performance of standard genetic algorithms (GAs) as regards the solution of highly complex combinatorial optimization problems. These improvements are related to some modifications in the GA, including local search and/or diversification procedures. The performance of each proposed version is evaluated through a graph partitioning problem. Extensive compu...

General and Robust Communication-Efficient Algorithms for Distributed Clustering

As datasets become larger and more distributed, algorithms for distributed clustering have become more and more important. In this work, we present a general framework for designing distributed clustering algorithms that are robust to outliers. Using our framework, we give a distributed approximation algorithm for k-means, k-median, or generally any ℓ_p objective, with z outliers and/or balance ...

Survey on Variants of Distributed Energy efficient Clustering Protocols in heterogeneous Wireless Sensor Network

Wireless sensor networks are composed of low cost and extremely power constrained sensor nodes which are scattered over a region forming self organized networks, making energy consumption a crucial design issue. Thus, finite network lifetime is widely regarded as a fundamental performance bottleneck. These networks are used for various applications such as field monitoring, home automation, med...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2022

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2020.3047631